1 Background and Introduction

Linear regression is well suited to understanding how one continuous variable predicts another continuous variable.

To understand how a categorical variable predicts a continuous variable, we need a special procedure: recoding the categories as indicator (dummy) variables.

2 Research Question

What is the association of citywide diversity and region of the country with neighborhood-level diversity?

Data Source: New York Times 538 Blog

  • Given that, as we will see, citywide diversity does not perfectly predict neighborhood diversity, can we estimate how strongly the former is associated with the latter?
  • How much does region of the country contribute to these dynamics?

3 Look at the Data

4 Plot the Data

The orange line below is a theory-driven line.

The red line below is a regression line.
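As a reminder of what a regression line is, here is a minimal sketch of an ordinary least squares fit of one continuous variable on another, written in plain Python. The function name `fit_line` and the toy numbers are hypothetical illustrations, not the 538 data.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The least squares line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy values, NOT the 538 data
citywide = [0.2, 0.4, 0.6, 0.8]
neighborhood = [0.15, 0.30, 0.45, 0.60]
a, b = fit_line(citywide, neighborhood)
print(f"intercept = {a:.4f}, slope = {b:.4f}")
```

With real data the points scatter around the line rather than falling on it, which is the sense in which citywide diversity "does not perfectly predict" neighborhood diversity.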

5 Map of Cities

Color    Region
------   ---------
green    West
red      SouthEast
blue     NorthEast
purple   Midwest

6 Region-Specific Regression Lines

6.1 Neighborhood Diversity as a Function of Citywide Diversity by Region of Country

7 Create Indicator Variables

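Essentially, we create a yes/no (1/0) indicator variable for each value of the categorical variable. One category must be left out as the reference group, because including an indicator for every category would make the indicators perfectly collinear with the intercept. In the model below, Midwest is the reference category: the output contains indicators only for NorthEast, SouthEast, and West. SPSS can create these indicator variables automatically.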

Screenshots: creating indicator variables
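The recoding itself is simple. Below is a minimal Python sketch, assuming the four regions from the map legend and Midwest as the reference category. The names `make_indicators`, `REGIONS`, and `REFERENCE` are hypothetical; this is not the SPSS procedure.

```python
REGIONS = ["Midwest", "NorthEast", "SouthEast", "West"]
REFERENCE = "Midwest"  # assumed reference category; absorbed into the intercept


def make_indicators(region, categories=REGIONS, reference=REFERENCE):
    """Return 1/0 indicators for every category except the reference."""
    if region not in categories:
        raise ValueError(f"unknown region: {region!r}")
    return {c: int(region == c) for c in categories if c != reference}


for r in REGIONS:
    print(r, make_indicators(r))
```

Each city gets at most one 1; a Midwest city gets all zeros, so its prediction comes from the intercept alone.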

8 Regression Model

\[\text{NeighborhoodDiversity}_i = \beta_0 + \beta_1 \text{CitywideDiversity}_i + \beta_2 \text{NorthEast}_i + \beta_3 \text{SouthEast}_i + \beta_4 \text{West}_i + e_i\]
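Because at most one indicator equals 1 for any city, the model reduces to four parallel lines that share one slope and differ only in their intercepts. Writing \(\beta_1\) for the CitywideDiversity coefficient and \(\beta_2, \beta_3, \beta_4\) for the NorthEast, SouthEast, and West coefficients, and taking Midwest as the omitted reference category (the fitted output has no Midwest term), the expected values are:

\[
\begin{aligned}
\text{Midwest:}   &\quad E[\text{NeighborhoodDiversity}] = \beta_0 + \beta_1 \text{CitywideDiversity} \\
\text{NorthEast:} &\quad E[\text{NeighborhoodDiversity}] = (\beta_0 + \beta_2) + \beta_1 \text{CitywideDiversity} \\
\text{SouthEast:} &\quad E[\text{NeighborhoodDiversity}] = (\beta_0 + \beta_3) + \beta_1 \text{CitywideDiversity} \\
\text{West:}      &\quad E[\text{NeighborhoodDiversity}] = (\beta_0 + \beta_4) + \beta_1 \text{CitywideDiversity}
\end{aligned}
\]

Each region coefficient is therefore the vertical gap between that region's line and the Midwest line, holding citywide diversity fixed.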

Fitting linear model: NEIGHBORHOOD_DIVERSITY_INDEX ~ CITYWIDE_DIVERSITY_INDEX + REGION

Term                        Estimate   Std. Error   t value    Pr(>|t|)
------------------------    --------   ----------   -------    ---------
(Intercept)                 -0.01911   0.02766      -0.6908    0.4914
CITYWIDE_DIVERSITY_INDEX     0.7457    0.04407      16.92      2e-30
REGIONnortheast              0.003935  0.01957       0.2011    0.8411
REGIONsoutheast              0.02858   0.01621       1.763     0.08115
REGIONwest                   0.08945   0.01595       5.606     2.019e-07

Observations   Residual Std. Error   \(R^2\)   Adjusted \(R^2\)
------------   -------------------   -------   ----------------
100            0.05206               0.7915    0.7827
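The reported estimates can be combined into the four region-specific intercepts, and two of the summary numbers can be re-derived by hand. Here is a small arithmetic sketch in Python that uses only numbers copied from the output above. It assumes Midwest is the reference region and takes p = 4 predictors (one citywide term plus three region indicators) from the model formula; the variable names are hypothetical.

```python
# Estimates copied from the regression output (Midwest assumed to be the reference)
intercept = -0.01911
slope_citywide = 0.7457
region_shift = {"Midwest": 0.0, "NorthEast": 0.003935, "SouthEast": 0.02858, "West": 0.08945}

# Each region's line: (intercept + shift) + slope_citywide * CitywideDiversity
region_intercepts = {r: intercept + s for r, s in region_shift.items()}
for r, b0 in region_intercepts.items():
    print(f"{r:10s} intercept = {b0:+.5f}, slope = {slope_citywide}")

# t value = estimate / standard error
t_citywide = slope_citywide / 0.04407
print(f"t for CITYWIDE_DIVERSITY_INDEX = {t_citywide:.2f}")

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
n, p, r2 = 100, 4, 0.7915
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"adjusted R^2 = {adj_r2:.4f}")
```

Read this way, a West city's predicted neighborhood diversity is about 0.089 higher than that of a Midwest city with the same citywide diversity (p ≈ 2e-07). The NorthEast and SouthEast shifts are not distinguishable from zero at the conventional 0.05 level.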